Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
Authors
Abstract
Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners in selecting algorithms that work broadly without tweaking the hyperparameters. In this work, blending the “geometrization” technique introduced by [L. Lei and M. I. Jordan, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 148--156] and the SARAH algorithm of [L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2613--2621], we propose the geometrized SARAH algorithm for nonconvex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak--Łojasiewicz (PL) constant, if present. In addition, it achieves the best-available convergence rate for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
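To make the construction concrete, here is a minimal sketch, not the authors' exact Geom-SARAH, that pairs a SARAH-style recursive gradient estimator with an inner-loop length drawn from a geometric distribution, which is what the “geometrization” of Lei and Jordan amounts to in their SCSG method. The function name, step size, batch size, and mean inner-loop length below are illustrative assumptions.

import numpy as np

def geom_sarah_sketch(grads, x0, step=0.05, outer_iters=20, mean_inner=10, batch=8, seed=0):
    """grads: list of per-sample gradient functions f_i'(x); x0: initial point."""
    rng = np.random.default_rng(seed)
    n, x = len(grads), np.array(x0, dtype=float)
    for _ in range(outer_iters):
        v = np.mean([g(x) for g in grads], axis=0)      # full gradient at the snapshot
        x_prev, x = x, x - step * v
        inner = rng.geometric(1.0 / mean_inner)          # geometrized inner-loop length
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            # SARAH recursion: correct the previous estimate with a mini-batch difference
            v = np.mean([grads[i](x) - grads[i](x_prev) for i in idx], axis=0) + v
            x_prev, x = x, x - step * v
    return x

# Toy usage: least squares, f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grads = [lambda x, a=A[i], bi=b[i]: a * (a @ x - bi) for i in range(100)]
x_out = geom_sarah_sketch(grads, np.zeros(5))

Drawing the inner-loop length from a geometric distribution keeps the expected per-epoch cost at roughly mean_inner stochastic steps and, in the analysis of Lei and Jordan, lets expectations telescope across the inner loop; this is the kind of property the paper's adaptivity argument builds on.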
Similar resources
Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural networks and have received many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provi...
Stochastic Recursive Gradient Algorithm for Nonconvex Optimization
In this paper, we study and analyze the mini-batch version of StochAstic Recursive grAdient algoritHm (SARAH), a method employing the stochastic recursive gradient, for solving empirical loss minimization for the case of nonconvex losses. We provide a sublinear convergence rate (to stationary points) for general nonconvex functions and a linear convergence rate for gradient dominated functions,...
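For reference, the mini-batch recursive gradient estimator at the core of SARAH can be written, in generic notation (mini-batch I_t of size b, step size \eta), as

v_0 = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(x_0), \qquad
v_t = \frac{1}{b}\sum_{i \in I_t}\bigl(\nabla f_i(x_t) - \nabla f_i(x_{t-1})\bigr) + v_{t-1}, \qquad
x_{t+1} = x_t - \eta\, v_t .

Unlike the SVRG estimator, v_t is not an unbiased estimator of \nabla f(x_t); the bias is the price paid for the recursive variance reduction.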
Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tack...
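As a small illustration of the proximal stochastic gradient step discussed above, the sketch below handles a composite objective f + h with smooth (possibly nonconvex) f and convex nonsmooth h; the choice h = lam * ||.||_1 and all names and parameters are illustrative assumptions, not that paper's setup.

import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sgd_step(x, grads, step, lam, batch, rng):
    """One step: mini-batch stochastic gradient of the smooth part, then the prox of the nonsmooth part."""
    idx = rng.choice(len(grads), size=batch, replace=False)
    g = np.mean([grads[i](x) for i in idx], axis=0)
    return soft_threshold(x - step * g, step * lam)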
Block stochastic gradient iteration for convex and nonconvex optimization
The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions or solve a stochastic optimization problem, very quickly to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are (much...
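A minimal sketch of a block stochastic gradient update in the spirit described above, assuming a fixed partition of the coordinates into blocks; the block selection rule, batch size, and names are illustrative assumptions.

import numpy as np

def block_sg_step(x, grads, blocks, step, batch, rng):
    """blocks: list of index arrays partitioning the coordinates of x."""
    blk = blocks[rng.integers(len(blocks))]           # pick one coordinate block at random
    idx = rng.choice(len(grads), size=batch, replace=False)
    g = np.mean([grads[i](x) for i in idx], axis=0)   # mini-batch stochastic gradient
    x = x.copy()
    x[blk] -= step * g[blk]                           # update only the chosen block
    return x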
Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization
This paper considers a class of constrained stochastic composite optimization problems whose objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a certain non-differentiable (but convex) component. In order to solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which proper mini-batch of samp...
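A minimal sketch of one mini-batch stochastic projected gradient step in the spirit of RSPG, using a Euclidean ball as a stand-in for the convex feasible set; the projection choice and parameter names are illustrative assumptions.

import numpy as np

def project_ball(z, radius):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

def rspg_step(x, grads, step, radius, batch, rng):
    """One projected step with a mini-batch stochastic gradient."""
    idx = rng.choice(len(grads), size=batch, replace=False)
    g = np.mean([grads[i](x) for i in idx], axis=0)
    return project_ball(x - step * g, radius)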
Journal
Journal title: SIAM Journal on Mathematics of Data Science
Year: 2022
ISSN: 2577-0187
DOI: https://doi.org/10.1137/21m1394308